EPISTAT
Statistical Package
for the IBM Personal Computer
Version 3.3
Written by:
Tracy L. Gustafson, M.D.
Copyright 1986
INTRODUCTION
EPISTAT is a collection of programs written in BASICA for
statistical analysis of small to medium-sized data samples ( < 28
samples or variables and < 2000 total data entries per file).
The 25 programs in EPISTAT perform more than 40 common statistical
tests or functions and provide utilities for data entry, editing,
printing, graphing, sorting, selecting, transforming and crosstabs.
The programs are intended to be as self-explanatory and user-
friendly as possible. You do not need to memorize this guide
before using the programs. On the other hand, neither the programs
nor this manual purport to TEACH the proper use or interpretation
of statistics. The user must have some familiarity with the kinds
of data required and the underlying assumptions appropriate to each
statistical test.
For further explanations of tests, refer to:
1. Colton, Theodore. Statistics in Medicine. Little, Brown and Co.
Boston, 1974.
2. Fleiss, Joseph. Statistical Methods for Rates and Proportions.
John Wiley and Sons. New York, 1981.
3. Rosner, Bernard. Fundamentals of Biostatistics. Prindle Weber and
Schmidt. Boston, 1982.
4. Snedecor, George W. and Cochran, William G. Statistical Methods.
Iowa State Univ. Press. Ames, Iowa, 1978.
5. Schlesselman, James. Case-Control Studies. Oxford Univ. Press.
New York, 1982.
6. Zar, Jerrold. Biostatistical Analysis. Prentice-Hall. Englewood
Cliffs, New Jersey. 1984.
CAVEAT:
These programs have been tested extensively, but I cannot
guarantee that they will work correctly with every possible data set.
Incorrect results are usually due to errors in format or type of
data entered. If you believe you have discovered an error in the
programs, please write me. I intend to correct any bugs that are
brought to my attention.
It is good practice to regularly compare the results obtained
by programs in EPISTAT with results obtained by your previous method
of calculation. ANY unexpected result should be questioned and
double-checked by reference to tables or another method of
calculation.
INDEX TO EPISTAT
The following statistical tests and functions are available:
TEST or FUNCTION                                     PROGRAM NAME
----------------                                     ------------
Analysis of variance (1 and 2-way)...................ANOVA
Bayes' theorem.......................................BAYES
Binomial distribution................................BINOMIAL
Chi-square test and distribution.....................CHISQR
Correlation coefficients.............................CORRELAT
F distribution.......................................ANOVA
Fisher's exact test..................................FISHERS
Linear regression analysis...........................LNREGRES
Mantel-Haenszel Chi-square test......................MHCHISQR
Mantel-Haenszel for multiple controls................MHCHIMLT
McNemar's test.......................................MCNEMAR
Mean, median and standard deviation..................DATA-ONE
Normal distribution..................................NORMAL
Poisson distribution.................................POISSON
Random sample generator..............................RANDOMIZ
Rank sum test........................................RANKTEST
Rates adjusted (direct and indirect).................RATEADJ
Sample size calculations.............................SAMPLSIZ
Signed rank test.....................................RANKTEST
Student's T-test and T distribution..................T-TEST
The following data-handling capabilities are provided:
DATA MANIPULATION                                    PROGRAM NAME
-----------------                                    ------------
Determine best test and program names................EPISTAT
Graph histograms.....................................HISTOGRM
Graph scattergrams...................................SCATRGRM
Perform data transformations.........................LNREGRES
Print data (sorted or input order)...................DATA-ONE
Print crosstab reports...............................XTAB
Select specific records..............................SELECT
Transfer data between EPISTAT files..................FILETRAN
Transfer data from FORTRAN to EPISTAT files..........FORTRANS
SYSTEM REQUIREMENTS FOR EPISTAT
MINIMUM                      OPTIMAL
-------                      -------
IBM PC with 64K RAM          IBM PC with 96K RAM
One 160K disk drive          Two 320K disk drives
Monochrome monitor           Color graphics adapter
BASICA                       Hi-res color monitor
                             BASICA
                             IBM, Epson, Okidata, or
                             C. Itoh Prowriter printer
                             with graphics capability
OVERALL PROGRAM DESCRIPTION
All calculations in EPISTAT are performed using single precision.
Although it may first appear that double precision would be more
appropriate for statistical tests, "double" precision makes little or
no real improvement in the accuracy of these programs. For best
results, data entries should be numbers between 1E+7 and 1E-7. Larger
or smaller numbers should be multiplied by an appropriate power of 10
before entry and analysis in EPISTAT.
All EPISTAT programs are written so that as much pertinent
information about the test as possible can fit on the final screen.
This feature allows a summary printed copy to be produced simply by
pressing <Shift-PrtSc>. This will work any time there is a pause in
the program display. Six programs, "DATA-ONE", "HISTOGRM", "RANDOMIZ",
"SCATRGRM", "SELECT", and "XTAB" produce printed reports without using
<Shift-PrtSc>. In these, follow program instructions to route output
to your printer.
EPISTAT is the introductory program in the EPISTAT package.
DATA-ONE is the major data entry, editing, and printing program. Most
of the programs in EPISTAT can evaluate data entered and saved using
DATA-ONE. Many of the programs can, in addition, evaluate summary
data. The programs marked with a star (*) below can evaluate data
entered in DATA-ONE. Non-starred programs provide their own data entry
routines.
The EPISTAT disk should be placed in drive A (or other default
drive) when loading any program because "EPIMRG" and "EPISETUP.DAT" are
used by every program. Once a program is running, EPISTAT can be
removed from drive A if necessary.
INDIVIDUAL PROGRAM DESCRIPTIONS
(1) "EPISTAT"
This introductory program lists the available programs. It also aids
the user in selecting the best statistical test. To do so, choose menu
option 2 and decide whether you are interested in tests for a single
sample, tests for 2 or more samples, other statistical functions, or data
handling utilities.
You are also allowed to specify hardware configuration and colors for a
color monitor. Choose colors 7,0,0 if you have a monochrome monitor
connected to the color/graphics adapter. If yours is not one of the
listed printers, check your printer's codes for the typeface you want.
For example, the code for elite type on the Prowriter is ESC "E". If you
press Escape then E, the display will show the decimal ASCII codes: 27
69. An alternate method is to press <Alt> and enter the decimal code on
the numeric keypad. Press <Enter> when the complete code is entered.
(2) "DATA-ONE" *
A. DATA ENTRY:
This is the central keyboard data entry program for the EPISTAT package
(for non-keyboard data entry, see FILETRAN and FORTRANS). Initial data
entry (Option 1) first asks you to name your samples or variables. Then
type in the data, pressing <Enter> after each entry. Press the TAB key
to back up one or two items on the SAME ROW. The maximum number of
samples or variables (S) allowed is 28 with a color adapter and 7 with a
monochrome adapter. The maximum number of records in each sample is
2000/S. A missing value can be entered by pressing <Enter> only. Note
that this is different from entering a zero (0). To exit, press key F10.
The mean, median and (n-1) standard deviation are then displayed. When
you return to the main menu, SAVE your datafile to disk (Option 5) for
future modification or use by other programs in the EPISTAT package.
Although all entries in a datafile are treated as numbers by
DATA-ONE, it is possible to enter characters (names) in a record.
Characters will be treated as zeros in calculations. Nevertheless, it
improves data readability to use the "Sample 1" column for record or case
names. Thus, DATA-ONE allows one to specify a name for each column
(variable) and each row (case) in the datafile.
B. DATA MODIFICATION:
APPEND (Option 2) allows one to add more observations to a sample at
a later session. EDIT (Option 3) allows one to delete or replace
incorrect data entries and to change sample or variable names. When you
return to the main menu, SAVE modified data to disk again.
C. PRINTING DATA:
To view or review a datafile, a printout to screen or printer can
be selected (Option 4). To print a datafile exactly as it was keyed in,
request the printout in INPUT order. DATA-ONE can also print the data
SORTED by any selected sample. Only numeric data is sorted by DATA-ONE,
so it will not alphabetize a character field. Blank records are not
sorted, either.
D. SAVING DATAFILES and LOADING DATAFILES:
SAVING data (Option 5), writes your data to disk in a sequential
file for later editing, review, or use by another program. DATA MUST BE
SAVED TO DISK before it can be used by other programs in EPISTAT. Since
EPISTAT must be in drive A: (or other default drive) to begin, you will
probably want to SAVE datafiles on drive B. To do so, precede each
datafile name with B: (e.g. B:TESTDATA). Do not enclose filenames in
quotation marks.
(3) "ANOVA" *
A. ONE-way ANOVA:
PURPOSE: To compare the means of 3 or more samples.
DATA REQUIRED: A DATA-ONE datafile with 3 or more columns/variables.
EXAMPLE: Are the mean ages of three groups of individuals
significantly different?
COMMENT: Sample means, (n-1) variances, the mean variance and the
variance of the means are displayed. Total sum of squares,
Treatment sum of squares and Error sum of squares are also
shown. Finally the F value, degrees of freedom (df) in the
numerator and df in the denominator and p value are given.
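The quantities listed in this comment can be reproduced by hand for a small data set. The following is a minimal sketch in Python (not EPISTAT's own BASICA code); the function name and the three sample groups are illustrative assumptions, and the p value is omitted because it requires the F distribution.

```python
def one_way_anova(groups):
    """Return (F, df numerator, df denominator) for a list of samples."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Treatment sum of squares: spread of the sample means around the grand mean
    ss_treatment = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Error sum of squares: spread of each value around its own sample mean
    ss_error = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_num = len(groups) - 1
    df_den = n_total - len(groups)
    f_value = (ss_treatment / df_num) / (ss_error / df_den)
    return f_value, df_num, df_den

# Hypothetical ages in three groups (made-up numbers)
f_value, df_num, df_den = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_value, df_num, df_den)
```

The resulting F value and degrees of freedom can then be entered in option C (F-value) to obtain the p value.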
B. TWO-way ANOVA:
PURPOSE: To evaluate the combined effects of 2 variables on a third
variable (ROW and COLUMN effects).
DATA REQUIRED: A DATA-ONE datafile with at least 2 columns and 2 rows.
EXAMPLE: How much of the variance in transparency of glass types is
attributable to the kind of sand and how much to the process
used to make it?
COMMENT: All samples in two-way ANOVA must have the same number of
elements. Sample means, (n-1) variances, Total sum of
squares, Row sum of squares, Column sum of squares and
Residual are all displayed. The F value, df in numerator,
df in denominator and corresponding p values are shown for
both the Row and Column effects.
C. F-value:
PURPOSE: To evaluate the p value associated with a known F value.
DATA REQUIRED: F value, df in numerator, and df in denominator.
REFERENCE: Snedecor, pp. 258-338.
(4) "BAYES"
A. Probabilities of false positive and false negative tests:
PURPOSE: To evaluate a test or procedure in terms of its sensitivity
and specificity.
DATA REQUIRED: Sensitivity and specificity of a test in relation to
a specific condition it tests for. The estimated incidence of
this condition in the population being tested.
EXAMPLE: If a test has a specificity of .99 and a sensitivity of .99,
how many false positives will occur in a population where the
incidence of this disease is only 100/10,100 ?
Answer: About 50% of positives will be false positives (of 10,100
people tested, 99 are true positives and 100 are false positives).
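The arithmetic for this example follows directly from Bayes' theorem. A minimal sketch in Python (the function name is an illustrative assumption, not part of EPISTAT):

```python
def false_positive_fraction(sensitivity, specificity, prevalence):
    """Fraction of all positive tests that are false positives."""
    true_positives = sensitivity * prevalence
    false_positives = (1 - specificity) * (1 - prevalence)
    return false_positives / (true_positives + false_positives)

# Values from the example: sensitivity .99, specificity .99, incidence 100/10,100
fraction = false_positive_fraction(0.99, 0.99, 100 / 10100)
print(round(fraction, 3))
```

With these inputs the function returns 100/199: in 10,100 people tested there are 99 true positives and 100 false positives.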
B. Probability of disease given a positive test:
PURPOSE: To determine the most likely disease given a certain positive
test.
DATA REQUIRED: The estimated incidence of several diseases in the test
population. (Use `OTHER' as the last disease so that the sum
of all percentages is 100). The probability of a positive
test in people known to have each disease (test sensitivities).
EXAMPLE: If antithyroid antibodies are found in patients with diabetes,
thyroiditis and other diseases, what is the a posteriori
probability of each diagnosis given a positive test? This
will vary as the relative incidence of these diseases varies
in the test population.
COMMENT: Although the examples deal with the use of medical tests, the
same statistical test applies to the relation of any test for
any condition.
REFERENCE: Fleiss, p. 5.
(5) "BINOMIAL"
PURPOSE: The binomial distribution allows calculation of the probability
of an observed number compared to a known expected.
DATA REQUIRED: A dichotomous variable that has an equal probability of
occurring in each of N trials.
EXAMPLE: What is the chance of obtaining 2 or fewer heads in 10 tosses
of a fair coin?
Answer: p = .055
COMMENT: BINOMIAL calculates the ONE-tailed probability of the observed
number and all more extreme situations. For example the
ONE-tailed probability of 2 heads in 10 tosses of a coin is the
sum of the probabilities for 0,1 and 2 heads.
REFERENCE: Colton, p. 151.
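The one-tailed sum described in the comment can be checked with a few lines of Python. This is a sketch of the calculation only; the function name is an illustrative assumption.

```python
from math import comb

def binomial_lower_tail(k, n, p):
    """Probability of k or fewer successes in n trials with success chance p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

# 2 or fewer heads in 10 tosses of a fair coin
print(round(binomial_lower_tail(2, 10, 0.5), 3))  # 0.055
```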
(6) "CHISQR"
A. Table of data:
PURPOSE: The Chi-square program evaluates a possible relationship
between the row variable and the column variable.
DATA REQUIRED: The counts for each cell of the table.
EXAMPLE: Is there a relationship between race and socioeconomic group?
COMMENT: 2 by 2 tables are evaluated using Yates' correction and the
odds ratio and its confidence limits are calculated using
Cornfield's method.
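For a 2 by 2 table, the Yates-corrected Chi-square and the odds ratio can be sketched as follows in Python. The function name and the cell counts are illustrative assumptions, and Cornfield's confidence limits (an iterative calculation) are not reproduced here.

```python
from math import erfc, sqrt

def chi_square_2x2_yates(a, b, c, d):
    """Yates-corrected chi-square, p value (1 df) and odds ratio for cells a, b / c, d."""
    n = a + b + c + d
    # Yates' correction subtracts n/2 from |ad - bc|, never going below zero
    corrected = max(abs(a * d - b * c) - n / 2, 0)
    chi2 = n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = erfc(sqrt(chi2 / 2))  # upper tail of the chi-square distribution with 1 df
    odds_ratio = (a * d) / (b * c)
    return chi2, p, odds_ratio

# Hypothetical counts (made-up numbers)
chi2, p, odds_ratio = chi_square_2x2_yates(20, 10, 10, 20)
print(round(chi2, 2), round(p, 3), odds_ratio)
```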
B. Chi-square value:
PURPOSE: To evaluate the p value associated with a known Chi-square value.
DATA REQUIRED: The chi-square value and the degrees of freedom.
C. Chi-square test for trend:
PURPOSE: To evaluate a possible directional relationship between the
row variable and the column variable. If the row is exposure
level and the column is outcome, the relationship is called a
`dose-response.'
DATA REQUIRED: A number that describes each `exposure level'. (If they
are not quantifiable, just use consecutive numbers.) The
number of cases and controls at each exposure level.
EXAMPLE: Is the risk of lung cancer directionally related to the
number of pack-years of smoking?
REFERENCE: Schlesselman, p. 175,177.
(7) "CORRELAT" *
A. Pearson's correlation coefficient:
PURPOSE: To assess the linear relationship between two variables.
DATA REQUIRED: A DATA-ONE datafile containing the two samples/variables
of interest.
EXAMPLE: How closely do age and blood pressure correlate?
COMMENT: The correlation coefficient is calculated and then tested
using the Student's T distribution for the probability that
such a correlation would occur by chance.
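The calculation described in the comment can be sketched in Python as below. The function name and data are illustrative assumptions; the sketch stops at the T value and degrees of freedom, which would then be looked up in the T distribution.

```python
from math import sqrt

def pearson_r_and_t(x, y):
    """Return (r, t, df) for paired samples x and y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / sqrt(sxx * syy)
    # Test statistic for the hypothesis that the true correlation is zero
    t = r * sqrt((n - 2) / (1 - r * r))
    return r, t, n - 2

# Hypothetical paired observations (made-up numbers)
r, t, df = pearson_r_and_t([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(r, 3), round(t, 3), df)
```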
B. R value:
PURPOSE: To evaluate the p value associated with a known R value.
DATA REQUIRED: The R value and the number of observations in the sample
from which it came.
C. Spearman's rank correlation:
PURPOSE: To assess the relationship between two variables that are not
normally distributed (and only a small sample is available).
DATA REQUIRED: A DATA-ONE datafile containing the 2 variables of interest.
EXAMPLE: How closely do infants' ages at death correlate
with birthweight?
COMMENT: The correlation coefficient is calculated but associated
p values are not calculated.
REFERENCE: Colton, p. 212.
(8) "FILETRAN" *
PURPOSE: To transfer a sample or column of data from one EPISTAT
datafile to another. This makes it unnecessary to re-enter
data, even if you need to compare 2 samples that are in separate
datafiles, or you have a data set with more than 28 variables
that you split between two or more datafiles. You may
create a new datafile by selecting one sample from DATAFILE #1
and another from DATAFILE #2. FILETRAN can also combine two
samples by APPENDING one to the other.
DATA REQUIRED: Two DATA-ONE datafiles. First enter the datafile you
wish to replace, add or append a sample TO. Then enter the
datafile you wish to transfer data FROM. After the data
sample has been added, you may save the data under the original
filename, or create a new datafile with the additional data
in it. You may also cancel the file modification if you find
you have made an error.
EXAMPLE: You performed the same experiment on two different days and
analyzed the results separately. Now you want to combine the
results of both experiments and analyze the combined data
set. FILETRAN will allow you to append the two files together
and save that data under a new filename.
COMMENT: If you want to append several columns of data from one
datafile to another, do not return to the main menu until all
columns have been appended. Exiting between appending will
leave large blank spaces in the file.
(9) "FISHERS"
PURPOSE: Fisher's exact test evaluates 2 by 2 tables of discrete
variables.
DATA REQUIRED: The counts for each of 4 cells of the table.
EXAMPLE: Is there a relationship between being bald and dying of
coronary heart disease?
COMMENT: Fisher's exact test is particularly valuable when the
Chi-square test is inappropriate because the expected value
for a cell is less than 5. However, this program cannot
evaluate some tables where A+B+C+D > 200.
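The exact probability comes from the hypergeometric distribution of cell A with the row and column totals held fixed. The Python sketch below is illustrative only: the function name and counts are assumptions, and the two-tailed rule used (sum every table no more probable than the observed one) is one common convention that this manual does not confirm for FISHERS.

```python
from math import comb

def fisher_exact(a, b, c, d):
    """Return (one-tailed p, two-tailed p) for the 2 by 2 table a, b / c, d."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Probability that the top-left cell equals x, given fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    low, high = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    lower = sum(prob(x) for x in range(low, a + 1))
    upper = sum(prob(x) for x in range(a, high + 1))
    two_tailed = sum(prob(x) for x in range(low, high + 1)
                     if prob(x) <= p_observed * (1 + 1e-9))
    return min(lower, upper), two_tailed

# Hypothetical counts (made-up numbers)
one_tailed, two_tailed = fisher_exact(3, 1, 1, 3)
print(round(one_tailed, 4), round(two_tailed, 4))
```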
(10) "FORTRANS"
PURPOSE: To transfer data from an SDF, FORTRAN, or sequential card
image file into EPISTAT DATA-ONE format.
DATA REQUIRED: A sequential card image file of equal-length records
each delimited by a carriage return and line feed. The
end of file must be marked by a CHR$(26) (Ctrl-Z). You must know the
record length (including spaces, but NOT including the carriage
return and line feed at the end of each line), the beginning
column number and width of each data item you want to transfer.
If your datafile contains understood (but not marked) decimal
places, then enter the number of decimal places. If your
datafile contains marked decimal places, then enter 0 for
(understood) decimal places. Finally, specify a missing value
code like 9999. If you have no missing values, then enter a
code that does not occur in your data set.
EXAMPLE: You have a FORTRAN file on the mainframe with 10 years worth
of data. You can select a subset of that data from a 6-month
period and read that into EPISTAT for some pilot analyses
before using mainframe time to analyze the entire data set.
COMMENT: FORTRANS can be used to extract selected data items from
DBASE(R) "SDF" type files and from LOTUS(R) "PRN" print files.
Be sure to first look at the datafile you create from DBASE or
LOTUS with your word processor in non-document mode to be sure
that all records are of equal length and that you know which
columns contain which data items. Some programs add extra
spaces here and there when creating an SDF file. FORTRANS
will not successfully read a datafile with more than 255
columns of data in each record.
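To make the record layout concrete, here is a minimal Python sketch of reading one card-image record by starting column, width, and understood decimal places. It is not FORTRANS itself; the function name, the sample record, and the choice to treat a blank field as missing are assumptions for illustration.

```python
def parse_record(line, fields, missing_code="9999"):
    """Extract data items from one fixed-width record.

    fields is a list of (start_column, width, understood_decimals),
    with columns numbered from 1 as in the description above.
    """
    values = []
    for start, width, decimals in fields:
        text = line[start - 1:start - 1 + width].strip()
        if text == "" or text == missing_code:
            values.append(None)  # missing value
        else:
            # With marked decimal points, pass 0 understood decimals
            values.append(float(text) / 10 ** decimals)
    return values

# Hypothetical 14-column record: an integer, a value with 2 understood
# decimal places, and the missing value code
record = "0012 0375 9999"
print(parse_record(record, [(1, 4, 0), (6, 4, 2), (11, 4, 0)]))  # [12.0, 3.75, None]
```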
(11) "HISTOGRM" *
PURPOSE: To graph a data sample according to user specifications in the
form of a histogram on the high resolution graphics screen.
DATA REQUIRED: A DATA-ONE datafile. The full name of the variable to
be graphed, its units, and the width of each cell in the
histogram.
EXAMPLE: What is the distribution of scores on the last exam?
COMMENT: You determine the appearance of the report by entering a label
for the horizontal axis and the interval width. To obtain a
printed copy on the IBM, Epson, Okidata or Prowriter printer
(specified in "EPISTAT" when you setup) press key F1. Press
F10 to return to the program.
(12) "LNREGRES" *
A. Linear regression:
PURPOSE: To calculate the least-squares regression line for paired
samples.
DATA REQUIRED: A DATA-ONE datafile and the sample numbers of the
predictor and dependent variables.
EXAMPLE: What is the regression line relating IQ to income?
COMMENT: The regression line is displayed in the form Y = b + aX.
The T distribution is applied to determine if the calculated
slope is significantly different from zero. The T value,
degrees of freedom and p value are shown.
REFERENCE: Colton p. 199.
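A least-squares fit in the form Y = b + aX, with the T test on the slope, can be sketched in Python as follows. The function name and data are illustrative assumptions; the p value is left to a T table or to the T-TEST program.

```python
from math import sqrt

def linear_regression(x, y):
    """Return (b, a, t, df) for the line Y = b + aX fitted by least squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx             # slope
    b = mean_y - a * mean_x   # intercept
    residual_ss = sum((yi - (b + a * xi)) ** 2 for xi, yi in zip(x, y))
    df = n - 2
    slope_se = sqrt(residual_ss / df / sxx)
    return b, a, a / slope_se, df

# Hypothetical predictor and dependent values (made-up numbers)
b, a, t, df = linear_regression([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(b, 3), round(a, 3), round(t, 3), df)
```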
B. Data transformations:
PURPOSE: To change a data set in a regular way, either to normalize
it or to identify a non-linear relationship between two
variables.
DATA REQUIRED: A DATA-ONE datafile with fewer than 28 variables in it.
EXAMPLE: In my sample, IQ and income were not linearly related, so I
will try a transformation to see if they are related
logarithmically.
COMMENT: Nine transformations are available:
1. Ax + B 6. A * ln(x) + B
2. A(x)squared + B 7. ln(x/(100-x))
3. A*square root(x) + B 8. Sample A + Sample B
4. A/x + B 9. Sample A * Sample B
5. x - mean
Specify the value for A and B and the program will apply that
formula to each value in the sample you want transformed. It
then adds this transformed sample to the datafile as an
additional column/variable. You may save the new datafile
containing this transformed variable under the old name or
under a new datafile name as you choose.
(13) "MHCHISQR"
PURPOSE: To evaluate the relationship between two discrete variables
while controlling for the effect of a third variable.
DATA REQUIRED: The names of the factors you wish to test for and control
for as well as the counts of cases and controls that have and
do not have the test and control variables. This is the
equivalent of a series of 2 by 2 tables, one for each category
of the control variable.
EXAMPLE: Is there a relationship between smoking and lung cancer,
controlled for occupation?
COMMENT: The factor you are testing must be dichotomous, but the control
variable may have more than 2 categories. The Chi-square value,
degrees of freedom, and p value are displayed. Also shown
are an odds ratio and 95% confidence limits on the odds ratio.
REFERENCE: Schlesselman, pp. 183,206.
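The stratified calculation can be sketched in Python as below. This is an illustration under stated assumptions, not the program's code: the function name and counts are made up, each stratum is given as (exposed cases, exposed controls, unexposed cases, unexposed controls), and a continuity correction of 0.5 is assumed. Confidence limits are not computed.

```python
from math import erfc, sqrt

def mantel_haenszel(strata):
    """Return (chi-square, p value with 1 df, pooled odds ratio)."""
    observed = expected = variance = 0.0
    or_numerator = or_denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        observed += a
        expected += (a + b) * (a + c) / n
        variance += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        or_numerator += a * d / n
        or_denominator += b * c / n
    chi2 = (abs(observed - expected) - 0.5) ** 2 / variance
    p = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p, or_numerator / or_denominator

# Two hypothetical categories of the control variable (made-up numbers)
chi2, p, odds_ratio = mantel_haenszel([(10, 5, 5, 10), (10, 5, 5, 10)])
print(round(chi2, 2), round(p, 3), round(odds_ratio, 2))
```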
(14) "MHCHIMLT" *
PURPOSE: To evaluate the relationship between cases and controls and a
test factor when each case is matched with 2 or more controls.
DATA REQUIRED: A DATA-ONE datafile or manually entered summary data. If
using DATA-ONE, a case sample and 2 or more control samples
should be present. Data is coded as "1" for factor present,
and "0" for factor absent in each case and control sample.
EXAMPLE: Is there a relationship between illness and eating raw potatoes?
COMMENT: The Chi-square value, degrees of freedom and p value are
displayed. Also shown are an odds ratio and 95% confidence
limits on the odds ratio. This test does not apply if each
case is matched with a different number of controls.
REFERENCE: Fleiss, p. 125.
(15) "MCNEMAR"
PURPOSE: Also called a paired Chi-square test, McNemar's test evaluates
a relationship between two variables by analyzing the number
of discordant PAIRS.
DATA REQUIRED: The name of the factor being tested in CASES and CONTROLS
and the number of pairs that belong in each of 4 cells.
EXAMPLE: In twins in which one developed a stroke and the other did not,
is there a relationship between high-fat diet and stroke?
COMMENT: The Chi-square value is calculated using Yates' correction, and
degrees of freedom and p value are displayed. Also shown are an
odds ratio and 95% confidence limits on the odds ratio.
REFERENCE: Schlesselman, p. 210.
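Only the discordant pairs enter the calculation. A minimal Python sketch, assuming b is the number of pairs in which only the case has the factor and c the number in which only the control has it (the function name and counts are illustrative; confidence limits are not computed):

```python
from math import erfc, sqrt

def mcnemar(b, c):
    """Return (Yates-corrected chi-square, p value with 1 df, odds ratio)."""
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    p = erfc(sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p, b / c

# Hypothetical discordant pair counts (made-up numbers)
chi2, p, odds_ratio = mcnemar(15, 5)
print(round(chi2, 2), round(p, 3), odds_ratio)
```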
(16) "NORMAL" *
A. Comparing a sample mean to the population mean:
PURPOSE: To see if your sample mean is different from a known population mean.
DATA REQUIRED: A DATA-ONE datafile and a known population mean.
EXAMPLE: Is the mean blood pressure in my sample statistically different
from the U.S. population mean?
COMMENT: The mean for the sample and the p value are displayed.
B. Percent of test values in a given range:
PURPOSE: To determine the percent of sample values that will fall between
two values in a normally distributed population.
DATA REQUIRED: The mean and standard deviation of the population being
sampled. The upper and lower limits of the range in question.
EXAMPLE: If the population mean height is 70 inches and the standard
deviation is 3 inches, what proportion of the population are
at least 65 inches but no more than 73 inches tall?
Answer: 79.4 % of the population.
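This answer can be checked against the normal cumulative distribution. A short Python sketch using the standard library (the function name is an illustrative assumption):

```python
from statistics import NormalDist

def fraction_between(mean, sd, lower, upper):
    """Fraction of a normal population lying between lower and upper."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

# Heights: mean 70 inches, standard deviation 3 inches, range 65 to 73 inches
print(round(100 * fraction_between(70, 3, 65, 73), 1))  # 79.4
```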
C. Z value:
PURPOSE: To evaluate the p value associated with a known Z value.
DATA REQUIRED: The known Z value.
COMMENT: A two-tailed p value is returned.
(17) "POISSON"
PURPOSE: To determine the probability of a certain number of cases or
events, when the expected rate is known but the number of
times when the case or event did not occur cannot be counted.
DATA REQUIRED: The number of cases observed and the expected number of
cases (calculated as expected rate * time interval).
EXAMPLE: Is it unusual for lightning to strike 5 people in one county
this year, given that in the last 5 years lightning has struck
only 8 people in this county?
Answer: p = .024
COMMENT: The ONE-tailed probability of observing the given number AND
all more extreme cases is displayed.
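For the lightning example the expected number is 8 cases / 5 years = 1.6 per year, and the one-tailed probability is the chance of 5 or more cases. A minimal Python sketch (the function name is an illustrative assumption):

```python
from math import exp, factorial

def poisson_upper_tail(observed, expected):
    """Probability of `observed` or more events when `expected` are expected."""
    below = sum(exp(-expected) * expected ** i / factorial(i) for i in range(observed))
    return 1 - below

# 5 or more strikes in a year when 1.6 are expected
print(round(poisson_upper_tail(5, 8 / 5), 3))  # 0.024
```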
(18) "RANDOMIZ"
A. Survey sample:
PURPOSE: To provide a series of random numbers to aid in selecting a
survey sample from a large number of possible respondents.
DATA REQUIRED: The smallest number and the largest number you want,
and the number of random numbers between those values you
want selected.
EXAMPLE: I want to survey 100 individuals from the pages of the
telephone book. The telephone book has 700 pages so I will
ask for 100 numbers between 1 and 700 and then phone the
tenth person on each of the randomly selected pages.
B. Unpaired case-control sample:
PURPOSE: To assign subjects to two equal groups randomly.
DATA REQUIRED: The total number of subjects in the study.
EXAMPLE: Assign 50 patients to receive drug A and 50 to receive drug B.
COMMENT: You are also asked if subjects will enter the study over a
period longer than one month. If so, you are warned that in
many studies it is preferable to randomize each month's cases
independently, so that seasonal biases do not creep in.
C. Paired case-control sample:
PURPOSE: To assign members of pairs to case and control groups randomly.
DATA REQUIRED: The total number of pairs. You must also decide on an
objective way of deciding which one of each pair is #1 and
which is #2.
EXAMPLE: Assign 20 pairs of patients to case and control groups randomly.
COMMENT: Consecutive order of patients admitted to the hospital is not
always a satisfactory method of deciding which of each pair is #1
and which is #2. Alphabetic criteria, day of week, or other
criteria entirely beyond the investigator's control are usually
better.
REFERENCE: Colton, p.259.
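The three kinds of random assignment described above can be sketched in Python as follows. The function names are illustrative assumptions, and drawing survey numbers without repeats is also an assumption; the manual does not say how RANDOMIZ handles duplicates.

```python
import random

def survey_sample(smallest, largest, count, rng):
    """Pick `count` distinct random numbers between smallest and largest."""
    return sorted(rng.sample(range(smallest, largest + 1), count))

def assign_two_groups(n_subjects, rng):
    """Split subjects 1..n at random into two equal groups."""
    ids = list(range(1, n_subjects + 1))
    rng.shuffle(ids)
    half = n_subjects // 2
    return sorted(ids[:half]), sorted(ids[half:])

def assign_pairs(n_pairs, rng):
    """For each pair, choose at random whether member #1 or #2 is the case."""
    return [rng.choice((1, 2)) for _ in range(n_pairs)]

rng = random.Random()  # pass random.Random(seed) for a repeatable list
pages = survey_sample(1, 700, 100, rng)   # telephone book example
drug_a, drug_b = assign_two_groups(100, rng)
case_member = assign_pairs(20, rng)
print(pages[:5], drug_a[:5], case_member[:5])
```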
(19) "RANKTEST" *
A. Rank sum test:
PURPOSE: To evaluate the difference between two unpaired non-parametric
samples. Comparable to the unpaired T-test for normally
distributed samples. It also specifically applies when
quantitative variables are not available but qualitative
ranks are.
DATA REQUIRED: A DATA-ONE datafile or the number of observations in each
of two samples and the sum of ranks for the first sample.
EXAMPLE: Is the duration of remission different for leukemia patients
treated with regimen #1 compared with regimen #2? Duration of
remission is measured in months and 8 cases and 10 controls
have been followed for 5 years.
COMMENT: If a DATA-ONE file is used, the medians and sums of ranks are
displayed for both groups. The two-tailed exact p value is
then calculated. For large samples ( N1+N2 > 24 ), the normal
approximation is used to calculate probabilities. Note that
even non-parametric samples larger than 30 can often be
evaluated with parametric tests like the T-test (the central
limit theorem).
B. Signed rank test:
PURPOSE: To evaluate the difference between two paired non-parametric
samples. Comparable to the paired T-test for normally
distributed samples. It also specifically applies when
quantitative variables are not available but qualitative
ranks are.
DATA REQUIRED: A DATA-ONE datafile or the number of non-zero differences
ranked and the sum of negative and then the sum of positive-signed
ranks.
EXAMPLE: For paired rats from the same litter, does extra dietary
vitamin E shorten the time it takes to complete a maze?
COMMENT: If a DATA-ONE file is used, the medians and sums of ranks are
displayed for both groups. The two-tailed exact p value is
then calculated. However, for large samples ( N > 20 ),
the normal approximation is used to calculate probabilities.
REFERENCE: Colton, pp. 219-222.
(20) "RATEADJ" *
A. Direct rate adjustment:
PURPOSE: To adjust a rate to a standard population for comparison
to other published rates.
DATA REQUIRED: A DATA-ONE datafile that includes one sample containing
the study rates to be adjusted (e.g. the rate in each age
group if age-adjusting). A second sample must contain the
standard population counts for the same groups. Rates in the
first sample may use any denominator (per 1000, per million,
etc), as you supply that denominator at the time of the
calculation.
EXAMPLE: Studying bladder cancer in Eskimos, you want to age-adjust
to the standard U.S. population to compare to other studies.
COMMENT: Direct adjustment may not be appropriate if the number of
cases in any one cell is fewer than 5.
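Direct adjustment is a weighted average of the study rates, with the standard population counts as weights. A minimal Python sketch (the function name and the two age groups are illustrative assumptions):

```python
def direct_adjusted_rate(study_rates, standard_counts):
    """Weighted mean of group-specific rates using standard population counts."""
    weighted = sum(rate * count for rate, count in zip(study_rates, standard_counts))
    return weighted / sum(standard_counts)

# Hypothetical rates per 1000 in two age groups and the standard
# population counts for the same groups (made-up numbers)
print(direct_adjusted_rate([2.0, 10.0], [3000, 1000]))  # 4.0
```

The adjusted rate is expressed in the same denominator as the study rates (here, per 1000).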
B. Indirect rate adjustment:
PURPOSE: To adjust sample observations to a standard population rate
for comparison to other published rates.
DATA REQUIRED: A DATA-ONE datafile that includes one sample containing
the number of cases observed in the study. A second sample
must contain the standard population rates for the same
groups. The standard population rates may use any denominator
(per 1000, per million, etc), as you supply that denominator
at the time of the calculation.
EXAMPLE: Studying bladder cancer in Eskimos, you find only 2 or 3 cases
in several of the younger age groups. You want to age-adjust
to standard U.S. population rates to compare to other studies.
COMMENT: In addition to age-adjusting, RATEADJ will calculate the
probability of observing the number of cases (total) that you
observed in your study. Enter the number observed and the
Expected number will be displayed as well as the one-tailed
POISSON probability of this outcome. The adjusted rate is
displayed in the form: ` X times the standard population rate.'
REFERENCE: Colton, pp. 47-51.
(21) "SAMPLSIZ"
A. Survey sample size:
PURPOSE: To determine the sample size required for a survey sample.
DATA REQUIRED: The approximate size of the population from which
you plan to draw the sample, your estimate of the rate of the
study characteristic (the result of your study), the accuracy
you require, and the z(alpha) level you wish to test.
EXAMPLE: What sample size is required to determine the immunization
levels in 2 year olds within 1% of the true value, given that
there are 100,000 2 year-olds in the state, and we believe that
95% are immunized? Let z(alpha) correspond to 95% certainty.
Answer: N = 1792
COMMENT: TP = total population
         pi = population proportion
         d  = maximum acceptable error in sample proportion
         n  = [ z(a)*SQR(pi*(1-pi)) / d ] squared
         N  = n / (1 + n/TP)
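The formula in this comment can be checked against the immunization example. A minimal Python sketch, assuming z(alpha) = 1.96 for 95% certainty (the function name is illustrative):

```python
from math import sqrt

def survey_sample_size(total_population, proportion, max_error, z_alpha=1.96):
    """Sample size with finite population correction: N = n / (1 + n/TP)."""
    n = (z_alpha * sqrt(proportion * (1 - proportion)) / max_error) ** 2
    return n / (1 + n / total_population)

# 100,000 two-year-olds, 95% believed immunized, accuracy within 1%
print(round(survey_sample_size(100_000, 0.95, 0.01)))  # 1792
```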
B. Sample size for a paired case-control study:
PURPOSE: To determine the number of cases and controls required for a
paired case-control study.
DATA REQUIRED: An estimate of the population rate of the study
characteristic, the smallest difference you wish to be able to
detect, and the z(beta) and z(alpha) levels of certainty you
require.
EXAMPLE: Paired rats are fed a normal diet plus or minus a suspected
carcinogen. How many rat pairs must be studied to detect a
1% increase in the population cancer rate of 3% , given that
z(beta) = 90% and z(alpha) = 95% ?
Answer: N = 3429
COMMENT:
N = [(z(a)*SQR(pi*(1-pi)) + |z(b)|*SQR(PT*(1-PT))) / (PT-pi)] squared
REFERENCE: Colton, p. 161.
C. Sample size for an unpaired case-control study:
PURPOSE: To determine the number of cases and controls required for an
unpaired case-control study.
DATA REQUIRED: An estimate of the Control group rate (used as the
population rate), whether the test group will be higher or lower
than the controls, the smallest difference you wish to be able to
detect, and the z(beta) and z(alpha) levels of certainty you
require.
EXAMPLE: How many case and control animals should be studied to determine
if a new antibiotic cures cattle disease 10% better than current
standard therapy? Current therapy cures 70% of animals. Let
z(beta) = 90% and z(alpha) = 95%.
Answer: 392 cases and 392 controls.
COMMENT:
N = [(z(a)*SQR(2*pi*(1-pi)) + |z(b)|*SQR(PT*(1-PT)+PC*(1-PC))) / (PT-PC)] squared
REFERENCE: Fleiss, p. 41 and Schlesselman, p. 168.
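The unpaired formula can be sketched the same way in Python. This is not the EPISTAT source and the function name is hypothetical. Two assumptions are made that the manual does not spell out: pi is taken as the mean of the test and control rates, (PT+PC)/2, and z(alpha) is two-sided while z(beta) is one-sided.

```python
import math
from statistics import NormalDist

def unpaired_sample_size(pc, pt, alpha_certainty=0.95, beta_certainty=0.90):
    """Cases (and, equally, controls) needed to tell rate pt from rate pc."""
    z_a = NormalDist().inv_cdf(1 - (1 - alpha_certainty) / 2)  # two-sided
    z_b = NormalDist().inv_cdf(beta_certainty)                 # one-sided
    pi = (pt + pc) / 2  # assumption: pi is the mean of the two rates
    numerator = (z_a * math.sqrt(2 * pi * (1 - pi))
                 + abs(z_b) * math.sqrt(pt * (1 - pt) + pc * (1 - pc)))
    return (numerator / (pt - pc)) ** 2

# Cattle example: control cure rate 70%, test cure rate 80%
per_group = unpaired_sample_size(0.70, 0.80)
print(math.ceil(per_group))
```

Rounded up, this gives 392 per group, the same as the example. Taking pi equal to the control rate instead gives a noticeably larger figure, which is why the mean is assumed here.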
(22) "SCATRGRM" *
PURPOSE: To graph the relationship between paired variables according to
user specifications on the high resolution graphics screen. To
display the linear regression line.
DATA REQUIRED: A DATA-ONE datafile containing two paired variables. The
minimum and maximum values in each variable are displayed. You
specify the labels and units to be printed on horizontal and
vertical axes. Then enter an interval width for each variable.
EXAMPLE: Graph the relationship between advertising expenditures and
gross sales based on the last 10 years of experience at
Company A.
COMMENT: Be sure to pick an interval width that will result in 20 or
fewer intervals on the vertical, and 60 or fewer intervals on
the horizontal axis. To display the linear regression line
press key F5. The formula for this regression line is
displayed in LNREGRES (number 12 above). To obtain a printed
copy on the IBM, Epson, Okidata or Prowriter (specified in
"EPISTAT"), press key F1. Press key F10 to return to the
program.
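Before running SCATRGRM it may help to check a proposed interval width against the limits given in the comment above. The Python sketch below is only an illustration. It assumes the number of intervals is the range of the variable divided by the width, rounded up; the program itself may count intervals differently. The function names and figures are invented.

```python
import math

def interval_count(minimum, maximum, width):
    """Intervals needed to span minimum..maximum at the given width (assumed rule)."""
    return math.ceil((maximum - minimum) / width)

def widths_fit(x_min, x_max, x_width, y_min, y_max, y_width):
    """True if the widths give at most 60 horizontal and 20 vertical intervals."""
    return (interval_count(x_min, x_max, x_width) <= 60
            and interval_count(y_min, y_max, y_width) <= 20)

# Hypothetical figures: advertising 0 to 300 (horizontal), sales 0 to 100 (vertical)
print(widths_fit(0, 300, 5, 0, 100, 5))   # 60 and 20 intervals
print(widths_fit(0, 300, 5, 0, 100, 4))   # 25 vertical intervals is too many
```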
(23) "SELECT" *
PURPOSE: To select a subset of a datafile based on user specifications.
Data can be selected for printing, or to create a new datafile
on disk.
DATA REQUIRED: A DATA-ONE datafile and knowledge of the selection
criteria you want to apply. One can select on any variable
with "AND" and "OR" specifications. As many as 10 selection
criteria can be set at one time. SELECT evaluates "AND" before
"OR", as if each "AND" expression were in parentheses. For example:
"SELECT IF Sample #1>10 AND Sample #2=1 OR Sample #1<Sample #3"
is interpreted as meaning:
"SELECT IF (Sample #1>10 AND Sample #2=1) OR Sample #1<Sample #3"
EXAMPLE: You have a datafile containing all of the quality control
results for a particular machine part this month. You want a
new file created which contains only those parts that failed
specifications. You may select all the samples that exceed
quality criteria.
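The way SELECT groups "AND" and "OR" can be illustrated with a few made-up cases. The Python sketch below is not how EPISTAT is written; it only mirrors the rule described above, since Python also evaluates "and" before "or". The field names s1, s2 and s3 stand in for Sample #1, #2 and #3.

```python
# Hypothetical cases; each dict holds the three samples for one case.
cases = [
    {"s1": 12, "s2": 1, "s3": 5},   # satisfies the AND part only
    {"s1": 12, "s2": 0, "s3": 5},   # satisfies neither part
    {"s1": 3,  "s2": 0, "s3": 9},   # satisfies the OR part only (s1 < s3)
]

def select(cases):
    """SELECT IF (Sample #1>10 AND Sample #2=1) OR Sample #1<Sample #3"""
    return [c for c in cases
            if c["s1"] > 10 and c["s2"] == 1 or c["s1"] < c["s3"]]

for case in select(cases):
    print(case)
```

The third case is the telling one. It is selected because the "OR" condition stands on its own. If the criteria were read as Sample #1>10 AND (Sample #2=1 OR Sample #1<Sample #3), it would be left out.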
(24) "T-TEST" *
A. Paired and unpaired T-test:
PURPOSE: To determine if the means of two samples are statistically
different.
DATA REQUIRED: A DATA-ONE datafile with the two samples to be compared.
If a paired test is being performed, both samples must contain
the same number of items.
EXAMPLE: Is the mean weight gain of a herd fed on new Brand X
significantly greater than the weight gain of a second herd
fed the standard brand feed?
COMMENT: The means and variances of the two samples will be displayed,
followed by the T value, degrees of freedom, and the p value.
For the unpaired T-test, the equality of variances is tested
to be sure that the assumptions of the T-test are met. If
the variances are statistically different, the F value
supporting that conclusion will be displayed. The confidence
limits on the difference between the two means are also
displayed.
REFERENCE: Snedecor, p. 116.
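For readers who want to see the arithmetic, here is a minimal Python sketch of both tests. It is not the EPISTAT code. It assumes the unpaired test uses the usual pooled variance and that the F value is the larger variance divided by the smaller. The data are invented and the function names are hypothetical.

```python
import math
from statistics import mean, variance

def unpaired_t(a, b):
    """Pooled-variance t test. Returns (t, degrees of freedom, F ratio)."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    f_ratio = max(va, vb) / min(va, vb)   # used to compare the two variances
    return t, na + nb - 2, f_ratio

def paired_t(a, b):
    """Paired t test on the differences. Returns (t, degrees of freedom)."""
    if len(a) != len(b):
        raise ValueError("paired samples must contain the same number of items")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    t = mean(diffs) / math.sqrt(variance(diffs) / n)
    return t, n - 1

# Hypothetical weight gains for two herds
standard = [1, 2, 3, 4, 5]
brand_x = [2, 4, 6, 8, 10]
print(unpaired_t(standard, brand_x))
print(paired_t(standard, brand_x))
```

The p value and the confidence limits are left out here because they require the T distribution.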
B. T value:
PURPOSE: To evaluate the p value associated with a given T value.
DATA REQUIRED: The T value and the degrees of freedom.
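One way to obtain such a p value is to integrate the T distribution numerically. The Python sketch below does this with Simpson's rule. It is only an illustration and is not necessarily the method EPISTAT uses. It also assumes a two-tailed p value is wanted. The function names are made up.

```python
import math

def t_density(x, df):
    """Density of the T distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + x * x / df))

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p value by Simpson's rule on 0..|t| (steps must be even)."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_density(0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    area = total * h / 3              # probability between 0 and |t|
    return max(0.0, 1 - 2 * area)

# T = 2.228 with 10 degrees of freedom is the familiar 5% critical value
print(round(two_tailed_p(2.228, 10), 3))
```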
(25) "XTAB" *
PURPOSE: To crosstabulate data in 1-, 2- or 3-way reports. This provides
the tabular counterpart of a scattergram.
DATA REQUIRED: A DATA-ONE datafile containing at least as many variables
as the number of ways you want to crosstabulate. The minimum
and maximum values for each sample will be displayed and then
you choose the interval width for each cell of the table. If
you have coded data with sequential integers, choose a width
of 1. If you have quantitative data, it is usually best to
choose an interval that will result in fewer than 10 cells or
the table will be difficult to read. In addition to choosing
the interval, you are offered the opportunity to label each
row and column interval with the label of your choice to make
a more readable report.
EXAMPLE: What is the age by sex breakdown of hospitalized cases of
meningitis?
COMMENT: The crosstab report is printed on screen or printer. The
number of missing values displayed is the number of cases
where one or more of the samples involved contained a blank.
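The idea of interval widths and missing values in a 2-way table can be sketched in Python as follows. This is not the XTAB program. It assumes each value is placed in the cell numbered (value - minimum) divided by the width, rounded down, and it uses None to stand for a blank. The data and names are invented.

```python
from collections import Counter

def crosstab(cases, min_a, width_a, min_b, width_b):
    """Count cases per (row interval, column interval) and count missing cases."""
    cells = Counter()
    missing = 0
    for a, b in cases:
        if a is None or b is None:    # a blank in either sample
            missing += 1
            continue
        row = int((a - min_a) // width_a)
        col = int((b - min_b) // width_b)
        cells[(row, col)] += 1
    return cells, missing

# Hypothetical (age, sex) pairs; sex is coded 1 or 2, so its width is 1
cases = [(34, 1), (37, 2), (52, 1), (None, 2), (8, 1)]
cells, missing = crosstab(cases, 0, 10, 1, 1)
for (row, col), count in sorted(cells.items()):
    print(f"ages {row * 10}-{row * 10 + 9}, sex {col + 1}: {count}")
print("missing:", missing)
```

Here ages are grouped in 10-year intervals and the case with a blank age is counted as missing instead of being placed in a cell.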
THE EXAMPLE DATAFILE
An example datafile, named "EXAMPLE", showing a sample of people,
their ages and their systolic blood pressures, is included on the EPISTAT
disk. To gain some familiarity with the appearance of an EPISTAT
datafile, follow these steps:
1.) Press <Ctrl> and <Alt> and <Del> at the same time (or load BASICA,
then type RUN "EPISTAT") to run the introductory program. Do not change
the default configuration for now, but move on to the main menu.
2.) Choose Menu option 3 to run specific programs in the EPISTAT package.
3.) Choose program number 2 to run "DATA-ONE", the main data entry and
printing program in EPISTAT.
4.) Choose Menu option 6 to load data from disk. Then enter the filename
EXAMPLE without any quotation marks.
5.) Return to the main DATA-ONE menu and choose option 4 to print this
datafile on your screen or printer. Print it once in input order,
then try printing it sorted by Sample 2 or 3.
6.) Choose menu option 7 to exit DATA-ONE, then enter Y because EXAMPLE
was already saved to disk. Choose other EPISTAT program numbers to
run ANOVA, HISTOGRM, LNREGRES, SCATRGRM, or XTAB with this datafile.
7.) Return to DATA-ONE to enter your own data for analysis.
NOTICE
---------------------------------------------------------------------
Users may copy EPISTAT and distribute it to others on the following
conditions:
1. The programs are not modified in any way.
2. Individual programs are not distributed separately.
3. No fee is charged for copying or distribution.
---------------------------------------------------------------------
====USER-SUPPORTED SOFTWARE====
The concept of user-supported software is based on three
principles:
1. The value and utility of a software package are best assessed
by each user on his or her own system with his or her own data.
Only after using a program can one determine whether it serves
one's personal applications, needs, and tastes.
2. The creation of independent personal computer software requires
a substantial commitment of time and effort. Rather than
replicate this effort time after time, the computing community
can and should support individual creative efforts.
3. By encouraging users to copy programs, rather than spending
large sums on copy-protection, authors can supply quality
software at reduced cost. Users will support useful programs.
If, after using EPISTAT, you find it of value, your contribution
in any amount will be appreciated ($25 suggested). If you are
interested in a more sophisticated statistical package, write or
call about the new TRUE EPISTAT.
Send contributions to:
Tracy L. Gustafson, M.D.
2011 Cap Rock Circle
Richardson, Texas 75080
214-680-1376
Thank you.